22 research outputs found

Many-core applications to online track reconstruction in HEP experiments

Author: Amerio S.
Bastieri D.
Corvo M.
Gianelle A.
Ketchum W.
Liu T.
Lonardo A.
Lucchesi D.
Poprocki S.
Rivera R.
Tosoratto L.
Vicini P.
Wittich P.
Publication venue: IOP Publishing
Publication date: 11/11/2013

Interest in parallel architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of Graphics Processing Units (GPUs) and the Intel Many Integrated Core (MIC) architecture when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. As a benchmark we use a scaled-up version of the algorithm used at the CDF experiment at the Tevatron for online track reconstruction, the SVT algorithm, as a realistic test case for low-latency trigger systems using new computing architectures for LHC experiments. We examine the complexity/performance trade-off in porting existing serial algorithms to many-core devices. Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the parallel devices.

Comment: Proceedings for the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP); missing acks added

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Ferrara

NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features

Author: Ameli F.
Ammendola R.
Biagioni A.
Frezza O.
Lamanna G.
Lo Cicero F.
Lonardo A.
Martinelli M.
Nicolau C.
Paolucci P.S.
Pastorelli E.
Pontisso L.
Rossetti D.
Simeone F.
Simula F.
Sozzi M.
Tosoratto L.
Vicini P.
Publication date: 13/06/2014

While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages. Although GPUs typically show deterministic latency when executing computational kernels once data is available in their internal memories, assessing the real-time features of a standard GPGPU system requires careful characterization of all subsystems along the data stream path. The networking subsystem turns out to be the most critical one in terms of both the absolute value and the fluctuations of its response latency. Our envisioned solution to this issue is NaNet, an FPGA-based PCIe Network Interface Card (NIC) design featuring a configurable and extensible set of network channels with direct access through GPUDirect to NVIDIA Fermi/Kepler GPU memories. The NaNet design currently supports both standard channels, GbE (1000BASE-T) and 10GbE (10GBASE-R), and custom channels, 34 Gbps APElink and 2.5 Gbps deterministic-latency KM3link, and its modularity allows for a straightforward inclusion of other link technologies. To avoid host OS intervention on the data stream and remove a possible source of jitter, the design includes a network/transport layer offload module with cycle-accurate, upper-bound latency, supporting UDP, KM3link Time Division Multiplexing and APElink protocols. After a description of the NaNet architecture and its latency/bandwidth characterization for all supported links, two real-world use cases are presented: the GPU-based low-level trigger for the RICH detector in the NA62 experiment at CERN and the on-/off-shore data link for the KM3 underwater neutrino telescope.

arXiv.org e-Print Archive

CERN Document Server

High-speed data transfer with FPGAs and QSFP+ modules

Author: A Biagioni
A Lonardo
A Salamon
D Rossetti
F Lo Cicero
F Simula
G Chiodi
G Salina
L Tosoratto
O Frezza
P S Paolucci
P Vicini
R Ammendola
R Lunadei
Publication venue: IOP Publishing
Publication date: 01/03/2011

We present test results and characterization of a data transmission system based on a latest-generation FPGA and a commercial QSFP+ (Quad Small Form Pluggable +) module. The QSFP+ standard defines a hot-pluggable transceiver available in copper or optical cable assemblies for an aggregated bandwidth of up to 40 Gbps. We implemented a complete testbench based on a commercial development card mounting an Altera Stratix IV FPGA with 24 serial transceivers at 8.5 Gbps, together with a custom mezzanine hosting three QSFP+ modules. We present test results and signal integrity measurements up to an aggregated bandwidth of 12 Gbps.

Comment: 5 pages, 3 figures, published in JINST (Journal of Instrumentation), proceedings of the Topical Workshop on Electronics for Particle Physics 2010, 20-24 September 2010, Aachen, Germany (R Ammendola et al 2010 JINST 5 C12019)

arXiv.org e-Print Archive

Crossref

APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

Author: A Biagioni
A Lonardo
A Salamon
Ammendola R
Bodin F
Chalasani Suresh
D Rossetti
F Lo Cicero
F Simula
G Salina
L Tosoratto
NVIDIA Corporation
O Frezza
P S Paolucci
P Vicini
Publication venue: IOP Publishing
Publication date: 18/02/2011

We describe the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology, plus hardware support for an RDMA programming model and experimental acceleration of GPU networking. This design allows us to build a low-latency, high-bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective cluster network architecture, scalable to tens of thousands of nodes. Some test results and characterization of data transmission of a complete testbench, based on a commercial development card mounting an Altera FPGA, are provided.

Comment: 6 pages, 7 figures, proceeding of CHEP 2010, Taiwan, October 18-2

arXiv.org e-Print Archive

Crossref

Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network

Author: A Biagioni
A Lonardo
D Rossetti
F Lo Cicero
F Simula
L Tosoratto
O Frezza
P S Paolucci
P Vicini
R Ammendola
Publication date: 06/06/2014

APEnet+ is an INFN (Italian National Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect network optimized for hybrid CPU-GPU clusters dedicated to high performance scientific computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI Express board with 6 bidirectional off-board links, each providing 34 Gbps of raw bandwidth per direction, and leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU transfers. APEnet+ transfer latency is minimized by adopting an RDMA protocol implemented in the FPGA with specialized hardware blocks tightly coupled with an embedded microprocessor. This architecture provides a high-performance, low-latency offload engine for both the transmit and receive sides of data transactions: preliminary results are encouraging, showing a 50% bandwidth increase for large-packet transfers. In this paper we describe the APEnet+ architecture, detail the hardware implementation, and discuss the impact of such specialized RDMA hardware on host interface latency and bandwidth.

Open Access Repository

NaNet3: The on-shore readout and slow-control board for the KM3NeT-Italia underwater neutrino telescope

Author: Ameli F.
Ammendola R.
Biagioni A.
Frezza O.
Lo Cicero F.
Lonardo A.
Martinelli M.
Nicolau C.A.
Paolucci P.S.
Pastorelli E.
Pontisso L.
Simeone F.
Simula F.
Tosoratto L.
Vicini P.
Publication venue: EDP Sciences
Publication date: 01/01/2016

The KM3NeT-Italia underwater neutrino detection unit, the tower, consists of 14 floors. Each floor supports 6 Optical Modules, containing the front-end electronics needed to digitize the PMT signal and to format and transmit the data, and 2 hydrophones that reconstruct the position of the Optical Modules in real time, for a maximum tower throughput of more than 600 MB/s. All floor data are collected by the Floor Control Module (FCM) board and transmitted over bidirectional optical virtual point-to-point connections to the on-shore laboratory, with each FCM needing an on-shore counterpart as its communication endpoint. In this contribution we present NaNet3, an on-shore readout board based on an Altera Stratix V GX FPGA and able to manage multiple FCM data channels at 800 Mbps each. The design is a NaNet customization for the KM3NeT-Italia experiment, adding support in its I/O interface for a synchronous link protocol with deterministic latency at the physical level and for a Time Division Multiplexing protocol at the data level.

Directory of Open Access Journals

Applications of many-core technologies to on-line event reconstruction in High Energy Physics experiments
2013 IEEE Nuclear Science Symposium and Medical Imaging Conference (2013 NSS/MIC)

Author: Amerio S.
Bastieri D.
Corvo M.
Gianelle A.
Ketchum W.
Liu T.
Lonardo A.
Lucchesi D.
Poprocki S.
Rivera R.
Tosoratto L.
Vicini P.
Wittich P.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

Interest in many-core architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of many-core devices when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. As a benchmark we use a scaled-up version of the algorithm used at the CDF experiment at the Tevatron for online track reconstruction, the SVT algorithm, as a realistic test case for low-latency trigger systems using new computing architectures for LHC experiments. We examine the complexity/performance trade-off in porting existing serial algorithms to many-core devices. We measure the performance of different architectures (Intel Xeon Phi and AMD GPUs, in addition to NVIDIA GPUs) and different software environments (OpenCL, in addition to NVIDIA CUDA). Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the many-core devices.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Ferrara

NaNet: A flexible and configurable low-latency NIC for real-time trigger systems based on GPUs

Author: Ammendola R.
Biagioni A.
Lo Cicero F.
Frezza O.
Lamanna Gianluca
Lonardo A.
Pantaleo F.
Paolucci P. S.
Rossetti D.
Simula F.
Sozzi Marco Stanislao
Tosoratto L.
Vicini P.
Publication venue: IOP Publishing
Publication date: 01/01/2014

NaNet is an FPGA-based PCIe X8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the modular NaNet hardware architecture. Latency and bandwidth benchmarks for the GbE and APElink channels are presented, followed by a performance analysis of a case study, the GPU-based low-level trigger for the RICH detector in the NA62 CERN experiment, using either the NaNet GbE or APElink channels. Finally, we outline the project's future activities.

© CERN 2014

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

NaNet

Author: A. Biagioni
A. Lonardo
C.A. Nicolau
E. Pastorelli
F. Ameli
F. Lo Cicero
F. Simeone
F. Simula
L. Pontisso
L. Tosoratto
M. Martinelli
O. Frezza
P. Vicini
P.S. Paolucci
R. Ammendola
Publication venue: EDP Sciences
Publication date: 01/01/2016

EDP Sciences OAI-PMH repository (1.2.0)

Open Access Repository

NaNet: a flexible and configurable low-latency NIC for real-time trigger systems based on GPUs

Author: Ammendola R.
Biagioni A.
Frezza O.
Lamanna G.
Lo Cicero F.
Lonardo A.
Pantaleo F.
Paolucci P.S.
Rossetti D.
Simula F.
Sozzi M.
Tosoratto L.
Vicini P.
Publication date: 15/11/2013

NaNet is an FPGA-based PCIe X8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the modular NaNet hardware architecture. Latency and bandwidth benchmarks for the GbE and APElink channels are presented, followed by a performance analysis of a case study, the GPU-based low-level trigger for the RICH detector in the NA62 CERN experiment, using either the NaNet GbE or APElink channels. Finally, we outline the project's future activities.

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

CERN Document Server